{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9cc3b6de",
   "metadata": {},
   "source": [
    "# Lesson 4: Evaluate the Tuned Model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f596bbd",
   "metadata": {},
   "source": [
    "#### Project environment setup\n",
    "\n",
    "- Install Tensorboard (if running locally)\n",
    "```Python\n",
    "!pip install tensorboard\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adddea6a",
   "metadata": {},
   "source": [
    "### Explore results with Tensorboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a9964a4-706b-4832-b4ab-9c668ba524e2",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "%load_ext tensorboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84fd0dac-ca26-480b-975b-729ee01c09d7",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "port = %env PORT1\n",
    "%tensorboard --logdir reward-logs --port $port --bind_all "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4697495d-424a-4ad3-862a-995acfce49f0",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Look at what this directory has\n",
    "%ls reward-logs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15b1d22b-5266-4859-a24d-47d03a5ec6d8",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "port = %env PORT2\n",
    "%tensorboard --logdir reinforcer-logs --port $port --bind_all"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0b17b76-9840-44a4-a503-650f22a4d38a",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "port = %env PORT3\n",
    "%tensorboard --logdir reinforcer-fulldata-logs --port $port --bind_all"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ce32db1",
   "metadata": {},
   "source": [
    "- The dictionary of 'parameter_values' defined in the previous lesson"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5cf353d-1264-47f6-995a-8a8f1043e08a",
   "metadata": {
    "height": 268
   },
   "outputs": [],
   "source": [
    "parameter_values={\n",
    "        \"preference_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl\",\n",
    "        \"prompt_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl\",\n",
    "        \"eval_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl\",\n",
    "        \"large_model_reference\": \"llama-2-7b\",\n",
    "        \"reward_model_train_steps\": 1410,\n",
    "        \"reinforcement_learning_train_steps\": 320,\n",
    "        \"reward_model_learning_rate_multiplier\": 1.0,\n",
    "        \"reinforcement_learning_rate_multiplier\": 1.0,\n",
    "        \"kl_coeff\": 0.1,\n",
    "        \"instruction\":\\\n",
    "    \"Summarize in less than 50 words\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a8fb9a2",
   "metadata": {},
   "source": [
    "**Note:** Here, we are using \"text_small\" for our datasets for learning purposes. However for the results that we're evaluating in this lesson, the team used the full dataset with the following hyperparameters:\n",
    "\n",
    "```Python\n",
    "parameter_values={\n",
    "        \"preference_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text/summarize_from_feedback_tfds/comparisons/train/*.jsonl\",\n",
    "        \"prompt_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/train/*.jsonl\",\n",
    "        \"eval_dataset\": \\\n",
    "    \"gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/val/*.jsonl\",\n",
    "        \"large_model_reference\": \"llama-2-7b\",\n",
    "        \"reward_model_train_steps\": 10000,\n",
    "        \"reinforcement_learning_train_steps\": 10000, \n",
    "        \"reward_model_learning_rate_multiplier\": 1.0,\n",
    "        \"reinforcement_learning_rate_multiplier\": 0.2,\n",
    "        \"kl_coeff\": 0.1,\n",
    "        \"instruction\":\\\n",
    "    \"Summarize in less than 50 words\"}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e30aeeb3",
   "metadata": {},
   "source": [
    "### Evaluate using the tuned and untuned model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "163d5544-a1fe-4c22-8d36-86f51a715889",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17305cf4-9615-475c-a747-c40e48354e8b",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "eval_tuned_path = 'eval_results_tuned.jsonl'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e0b791a-975e-4ec2-9268-281eb9377c37",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "eval_data_tuned = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed93decd-c6ed-4746-bf9f-3351d414eb77",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "with open(eval_tuned_path) as f:\n",
    "    for line in f:\n",
    "        eval_data_tuned.append(json.loads(line))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81f5f3d1-17dc-45f7-8b25-6fe3f80ca8e6",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Import for printing purposes\n",
    "from utils import print_d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5193a7f7-a6ac-4369-9f39-c7176c8828b1",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Look at the result produced by the tuned model\n",
    "print_d(eval_data_tuned[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37199f28-7d52-4530-98e6-2ba2fe255d6e",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "eval_untuned_path = 'eval_results_untuned.jsonl'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6deecff9-c5ca-4f8c-802a-b76c8e3c3777",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "eval_data_untuned = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "57ede73e-dcf6-49b1-947a-2524815680cf",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "with open(eval_untuned_path) as f:\n",
    "    for line in f:\n",
    "        eval_data_untuned.append(json.loads(line))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1bb71791-0421-40da-868c-4c681e7d35ff",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Look at the result produced by the untuned model\n",
    "print_d(eval_data_untuned[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7dc805be",
   "metadata": {},
   "source": [
    "### Explore the results side by side in a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a18d95c7-5317-41c7-b545-749bbc536faa",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "# Extract all the prompts\n",
    "prompts = [sample['inputs']['inputs_pretokenized']\n",
    "           for sample in eval_data_tuned]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "adddcb4a-ed37-4b2e-8b57-a455854f5b79",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "# Completions from the untuned model\n",
    "untuned_completions = [sample['prediction']\n",
    "                       for sample in eval_data_untuned]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b432b804-1214-45aa-b268-bcd11741b573",
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "# Completions from the tuned model\n",
    "tuned_completions = [sample['prediction']\n",
    "                     for sample in eval_data_tuned]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cec6f78",
   "metadata": {},
   "source": [
    "- Now putting all together in one big dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01e3fb37-02f6-4539-bf6f-ee7fc717e981",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c86c93a8-3b65-470f-b110-018aa5e6fa09",
   "metadata": {
    "height": 81
   },
   "outputs": [],
   "source": [
    "results = pd.DataFrame(\n",
    "    data={'prompt': prompts,\n",
    "          'base_model':untuned_completions,\n",
    "          'tuned_model': tuned_completions})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3df43314-62fe-4c0e-ad20-6e41bd9a3045",
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "pd.set_option('display.max_colwidth', None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b43249d-94a3-4252-9d12-29ad13e15c17",
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Print the results\n",
    "results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
